forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sync up with Linus #75
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The lock_class iteration of /proc/lock_stat is not serialized against the lockdep_free_key_range() call from module unload. Therefore it can happen that we find a class of which ->name/->key are no longer valid. There is a further bug in zap_class() that left ->name dangling. Cure this. Use RCU_INIT_POINTER() because NULL. Since lockdep_free_key_range() is rcu_sched serialized, we can read both ->name and ->key under rcu_read_lock_sched() (preempt-disable) and be assured that if we observe a !NULL value it stays safe to use for as long as we hold that lock. If we observe both NULL, skip the entry. Reported-by: Jerome Marchand <[email protected]> Tested-by: Jerome Marchand <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
CBOX counters are increased to 48b on HSX. Correct the MSR address for HSWEP_U_MSR_PMON_CTR0 and HSWEP_U_MSR_PMON_CTL0. See specification in: http://www.intel.com/content/www/us/en/processors/xeon/ xeon-e5-v3-uncore-performance-monitoring.html Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
Fixes: 6058bb3 'ARM: sun7i/sun6i: irqchip: Add irqchip driver for NMI controller' Signed-off-by: Axel Lin <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: Carlo Caione <[email protected]> Cc: Jason Cooper <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
This patch adds native DSD support for the XMOS based JLsounds I2SoverUSB board Signed-off-by: Jurgen Kramer <[email protected]> Cc: <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
In the commit below, I missed the connector allocation in the function intel_sdvo_analog_init(), leading to those connectors to have a NULL state pointer. commit 08d9bc9 Author: Ander Conselvan de Oliveira <[email protected]> Date: Fri Apr 10 10:59:10 2015 +0300 drm/i915: Allocate connector state together with the connectors Reported-by: Stefan Lippers-Hollmann <[email protected]> Tested-by: Stefan Lippers-Hollmann <[email protected]> Signed-off-by: Ander Conselvan de Oliveira <[email protected]> Signed-off-by: Jani Nikula <[email protected]>
Using _bh variant for spin locks causes this kind of warning: Starting logging: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 3 at /ssd_drive/linux/kernel/softirq.c:151 __local_bh_enable_ip+0xe8/0xf4() Modules linked in: CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.1.0-rc2+ #94 Hardware name: Atmel SAMA5 [<c0013c04>] (unwind_backtrace) from [<c00118a4>] (show_stack+0x10/0x14) [<c00118a4>] (show_stack) from [<c001bbcc>] (warn_slowpath_common+0x80/0xac) [<c001bbcc>] (warn_slowpath_common) from [<c001bc14>] (warn_slowpath_null+0x1c/0x24) [<c001bc14>] (warn_slowpath_null) from [<c001e28c>] (__local_bh_enable_ip+0xe8/0xf4) [<c001e28c>] (__local_bh_enable_ip) from [<c01fdbd0>] (at_xdmac_device_terminate_all+0xf4/0x100) [<c01fdbd0>] (at_xdmac_device_terminate_all) from [<c02221a4>] (atmel_complete_tx_dma+0x34/0xf4) [<c02221a4>] (atmel_complete_tx_dma) from [<c01fe4ac>] (at_xdmac_tasklet+0x14c/0x1ac) [<c01fe4ac>] (at_xdmac_tasklet) from [<c001de58>] (tasklet_action+0x68/0xb4) [<c001de58>] (tasklet_action) from [<c001dfdc>] (__do_softirq+0xfc/0x238) [<c001dfdc>] (__do_softirq) from [<c001e140>] (run_ksoftirqd+0x28/0x34) [<c001e140>] (run_ksoftirqd) from [<c0033a3c>] (smpboot_thread_fn+0x138/0x18c) [<c0033a3c>] (smpboot_thread_fn) from [<c0030e7c>] (kthread+0xdc/0xf0) [<c0030e7c>] (kthread) from [<c000f480>] (ret_from_fork+0x14/0x34) ---[ end trace b57b14a99c1d8812 ]--- It comes from the fact that devices can called some code from the DMA controller with irq disabled. _bh variant is not intended to be used in this case since it can enable irqs. Switch to irqsave/irqrestore variant to avoid this situation. Signed-off-by: Ludovic Desroches <[email protected]> Cc: [email protected] # 4.0 and later Signed-off-by: Vinod Koul <[email protected]>
Rework slave configuration part in order to more report wrong errors about the configuration. Only maxburst and addr width values are checked when doing the slave configuration. The validity of the channel configuration is done at prepare time. Signed-off-by: Ludovic Desroches <[email protected]> Cc: [email protected] # 4.0 and later Signed-off-by: Vinod Koul <[email protected]>
The MW regbase and vbase(s) were not being freed if an error occurred in the vbase allocation loop. This is corrected by updating the error path for the allocation loop to err4. Reported-by: Julia Lawall <[email protected]> Signed-off-by: Jon Mason <[email protected]>
The new regmap code seems to cache this, which isn't helpful for the hotplug dock situation where this gets updated. Use the uncached query for this. Signed-off-by: Dave Airlie <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
Passive DP->DVI/HDMI dongles on DP++ ports show up to the system as HDMI devices, as they do not have a sink device in them to respond to any AUX traffic. When probing these dongles over the DDC, sometimes they will NAK the first attempt even though the transaction is valid and they support the DDC protocol. The retry loop inside of drm_do_probe_ddc_edid() would normally catch this case and try the transaction again, resulting in success. That, however, was thwarted by the fix for [1]: commit 9292f37 Author: Eugeni Dodonov <[email protected]> Date: Thu Jan 5 09:34:28 2012 -0200 drm: give up on edid retries when i2c bus is not responding This added code to exit immediately if the return code from the i2c_transfer function was -ENXIO in order to reduce the amount of time spent in waiting for unresponsive or disconnected devices. That was possible because the underlying i2c bit banging algorithm had retries of its own (which, of course, were part of the reason for the bug the commit fixes). Since its introduction in commit f899fc6 Author: Chris Wilson <[email protected]> Date: Tue Jul 20 15:44:45 2010 -0700 drm/i915: use GMBUS to manage i2c links we've been flipping back and forth enabling the GMBUS transfers, but we've settled since then. The GMBUS implementation does not do any retries, however, bailing out of the drm_do_probe_ddc_edid() retry loop on first encounter of -ENXIO. This, combined with Eugeni's commit, broke the retry on -ENXIO. Retry GMBUS once on -ENXIO on first message to mitigate the issues with passive adapters. This patch is based on the work, and commit message, by Todd Previte <[email protected]>. [1] https://bugs.freedesktop.org/show_bug.cgi?id=41059 v2: Don't retry if using bit banging. v3: Move retry within gmbux_xfer, retry only on first message. v4: Initialize GMBUS0 on retry (Ville). v5: Take index reads into account (Ville). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85924 Cc: Todd Previte <[email protected]> Cc: [email protected] Tested-by: Oliver Grafe <[email protected]> (v2) Tested-by: Jim Bride <[email protected]> Reviewed-by: Ville Syrjälä <[email protected]> Signed-off-by: Jani Nikula <[email protected]>
Commit be0c37c ("MIPS: Rearrange PTE bits into fixed positions.") rearranged the PTE bits into fixed positions in preparation for the XPA support. However, this patch broke R6 since it only took R2 cores into consideration for the RI/XI bits leading to boot failures. We fix this by adding the missing CONFIG_CPU_MIPSR6 definitions Fixes: be0c37c ("MIPS: Rearrange PTE bits into fixed positions.") Cc: Steven J. Hill <[email protected]> Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/10208/ Signed-off-by: Markos Chandras <[email protected]> Signed-off-by: Ralf Baechle <[email protected]>
…nitialization" This reverts commit c05199e. Vince Weaver reported the following crash while perf fuzzing: [ 79.473121] kernel BUG at mm/vmalloc.c:1335! [ 79.694391] Call Trace: [ 79.696997] <IRQ> [ 79.699090] [<ffffffff811b2130>] get_vm_area_caller+0x40/0x50 [ 79.705505] [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90 [ 79.712414] [<ffffffff810635e5>] __ioremap_caller+0x195/0x350 [ 79.718610] [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90 [ 79.725462] [<ffffffff81427f6b>] ? debug_object_activate+0x14b/0x1e0 [ 79.732346] [<ffffffff810637b7>] ioremap_nocache+0x17/0x20 [ 79.738283] [<ffffffff81039f4d>] snb_uncore_imc_init_box+0x6d/0x90 [ 79.744945] [<ffffffff81039cf7>] snb_uncore_imc_event_start+0xb7/0x110 [ 79.752020] [<ffffffff81039d97>] snb_uncore_imc_event_add+0x47/0x60 [ 79.758832] [<ffffffff81162cbb>] event_sched_in.isra.85+0xfb/0x330 [ 79.765519] [<ffffffff81162f5f>] group_sched_in+0x6f/0x1e0 [ 79.771481] [<ffffffff8101df1a>] ? native_sched_clock+0x2a/0x90 [ 79.777858] [<ffffffff811637bc>] __perf_event_enable+0x25c/0x2a0 [ 79.784418] [<ffffffff810f3e69>] ? tick_nohz_irq_exit+0x29/0x30 [ 79.790820] [<ffffffff8115ef30>] ? cpu_clock_event_start+0x40/0x40 [ 79.797546] [<ffffffff8115ef80>] remote_function+0x50/0x60 [ 79.803535] [<ffffffff810f8cd1>] flush_smp_call_function_queue+0x81/0x180 [ 79.810840] [<ffffffff810f9763>] generic_smp_call_function_single_interrupt+0x13/0x60 [ 79.819328] [<ffffffff8104b5e8>] smp_trace_call_function_single_interrupt+0x38/0xc0 [ 79.827614] [<ffffffff816de9be>] trace_call_function_single_interrupt+0x6e/0x80 [ 79.835465] <EOI> [ 79.837543] [<ffffffff8156e8b5>] ? cpuidle_enter_state+0x65/0x160 [ 79.844377] [<ffffffff8156e8a1>] ? cpuidle_enter_state+0x51/0x160 [ 79.851015] [<ffffffff8156e9e7>] cpuidle_enter+0x17/0x20 [ 79.856791] [<ffffffff810b6e39>] cpu_startup_entry+0x399/0x440 [ 79.863165] [<ffffffff816c9ddb>] rest_init+0xbb/0xd0 The offending commit is clearly confused as it moves heavy initialization work into IPI context. Revert it. Reported-by: Vince Weaver <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Kan Liang <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Yan, Zheng <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
…ister The existing hardware implementations with PASID support advertised in bit 28? Forget them. They do not exist. Bit 28 means nothing. When we have something that works, it'll use bit 40. Do not attempt to infer anything meaningful from bit 28. This will be reflected in an updated VT-d spec in the extremely near future. Signed-off-by: David Woodhouse <[email protected]>
Dan Carpenter reported missing brackets which resulted in reading a wrong crystalfreq value. I also noticed that the result of this function is ignored. Reported-By: Dan Carpenter <[email protected]> Signed-off-by: Hauke Mehrtens <[email protected]> Signed-off-by: Michael Buesch <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/10536/ Signed-off-by: Ralf Baechle <[email protected]>
Until recently, mac80211 overwrote all the statistics it could provide when getting called, but it now relies on the struct having been zeroed by the caller. This was always the case in nl80211, but wext used a static struct which could even cause values from one device leak to another. Using a static struct is OK (as even documented in a comment) since the whole usage of this function and its return value is always locked under RTNL. Not clearing the struct for calling the driver has always been wrong though, since drivers were free to only fill values they could report, so calling this for one device and then for another would always have leaked values from one to the other. Fix this by initializing the structure in question before the driver method call. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=99691 Cc: [email protected] Reported-by: Gerrit Renker <[email protected]> Reported-by: Alexander Kaltsas <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Now blk_cleanup_queue() can be called before calling del_gendisk()[1], inside which hctx->ctxs is touched from blk_mq_unregister_hctx(), but the variable has been freed by blk_cleanup_queue() at that time. So this patch moves freeing of hctx->ctxs into queue's release handler for fixing the oops reported by Stefan. [1], 6cd18e7 (block: destroy bdi before blockdev is unregistered) Reported-by: Stefan Seyfried <[email protected]> Cc: NeilBrown <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: [email protected] (v4.0) Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
Along with the transition to regmap for managing the cached parameter reads, the caps overwrite was also moved to regmap cache. The cache change itself works, but it still tries to write the non-existing verb (the HDA parameter is read-only) wrongly. It's harmless in most cases, but some chips are picky and may result in the codec communication stall. This patch avoids it just by adding the missing flag check in reg_write ops. Signed-off-by: Takashi Iwai <[email protected]>
…odule. If CONFIG_MTD_PHYSMAP is set to m, the Cobalt mtd.ko module might get unloaded while the drivers/mtd modules are still loaded resulting in stale references to the destroyed platform_device instance. Anyway, platform devices should always be registered indicated what devices are present, _not_ what drivers have been configured. Signed-off-by: Ralf Baechle <[email protected]>
If CONFIG_SERIAL_8250 is set to m, the Loongson seria.ko module might get unloaded while the serial driver modules are still loaded resulting in stale references to the destroyed platform_device instance. Anyway, platform devices should always be registered indicated what devices are present, _not_ what drivers have been configured. Signed-off-by: Ralf Baechle <[email protected]> Reported-by: Paul Gortmaker <[email protected]> Patchwork: https://patchwork.linux-mips.org/patch/10538/
Due to the slightly odd way that new threads and processes start execution when scheduled for the very first time they were bypassing the required disable_msa call. Signed-off-by: Ralf Baechle <[email protected]>
Fix include/asm-generic/io.h: In function 'readb': include/asm-generic/io.h:113:2: error: implicit declaration of function 'bfin_read8' include/asm-generic/io.h: In function 'readw': include/asm-generic/io.h:121:2: error: implicit declaration of function 'bfin_read16' include/asm-generic/io.h: In function 'readl': include/asm-generic/io.h:129:2: error: implicit declaration of function 'bfin_read32' include/asm-generic/io.h: In function 'writeb': include/asm-generic/io.h:147:2: error: implicit declaration of function 'bfin_write8' include/asm-generic/io.h: In function 'writew': include/asm-generic/io.h:155:2: error: implicit declaration of function 'bfin_write16' include/asm-generic/io.h: In function 'writel': include/asm-generic/io.h:163:2: error: implicit declaration of function 'bfin_write32' Reported-by: Geert Uytterhoeven <[email protected]> Fixes: 1a3372b ("blackfin: io: define __raw_readx/writex with bfin_readx/writex") Cc: Steven Miao <[email protected]> Signed-off-by: Guenter Roeck <[email protected]>
The latest version of modinfo fails to compile score architecture targets with the following error. FATAL: The relocation at __ex_table+0x634 references section "__ex_table" which is not executable, IOW the kernel will fault if it ever tries to jump to it. Something is seriously wrong and should be fixed. The probem is caused by a bad label in an __ex_table entry. Acked-by: Lennox Wu <[email protected]> Cc: Quentin Casasnovas <[email protected]> Signed-off-by: Guenter Roeck <[email protected]>
This reverts commit 0243508. It introduces new regressions. Signed-off-by: David S. Miller <[email protected]>
Izumi found the following oops when hot re-adding a node: BUG: unable to handle kernel paging request at ffffc90008963690 IP: __wake_up_bit+0x20/0x70 Oops: 0000 [#1] SMP CPU: 68 PID: 1237 Comm: rs:main Q:Reg Not tainted 4.1.0-rc5 #80 Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 1.87 04/28/2015 task: ffff880838df8000 ti: ffff880017b94000 task.ti: ffff880017b94000 RIP: 0010:[<ffffffff810dff80>] [<ffffffff810dff80>] __wake_up_bit+0x20/0x70 RSP: 0018:ffff880017b97be8 EFLAGS: 00010246 RAX: ffffc90008963690 RBX: 00000000003c0000 RCX: 000000000000a4c9 RDX: 0000000000000000 RSI: ffffea101bffd500 RDI: ffffc90008963648 RBP: ffff880017b97c08 R08: 0000000002000020 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a0797c73800 R13: ffffea101bffd500 R14: 0000000000000001 R15: 00000000003c0000 FS: 00007fcc7ffff700(0000) GS:ffff880874800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc90008963690 CR3: 0000000836761000 CR4: 00000000001407e0 Call Trace: unlock_page+0x6d/0x70 generic_write_end+0x53/0xb0 xfs_vm_write_end+0x29/0x80 [xfs] generic_perform_write+0x10a/0x1e0 xfs_file_buffered_aio_write+0x14d/0x3e0 [xfs] xfs_file_write_iter+0x79/0x120 [xfs] __vfs_write+0xd4/0x110 vfs_write+0xac/0x1c0 SyS_write+0x58/0xd0 system_call_fastpath+0x12/0x76 Code: 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8d 47 48 <48> 39 47 48 48 c7 45 e8 00 00 00 00 48 c7 45 f0 00 00 00 00 48 RIP [<ffffffff810dff80>] __wake_up_bit+0x20/0x70 RSP <ffff880017b97be8> CR2: ffffc90008963690 Reproduce method (re-add a node):: Hot-add nodeA --> remove nodeA --> hot-add nodeA (panic) This seems an use-after-free problem, and the root cause is zone->wait_table was not set to *NULL* after free it in try_offline_node. When hot re-add a node, we will reuse the pgdat of it, so does the zone struct, and when add pages to the target zone, it will init the zone first (including the wait_table) if the zone is not initialized. The judgement of zone initialized is based on zone->wait_table: static inline bool zone_is_initialized(struct zone *zone) { return !!zone->wait_table; } so if we do not set the zone->wait_table to *NULL* after free it, the memory hotplug routine will skip the init of new zone when hot re-add the node, and the wait_table still points to the freed memory, then we will access the invalid address when trying to wake up the waiting people after the i/o operation with the page is done, such as mentioned above. Signed-off-by: Gu Zheng <[email protected]> Reported-by: Taku Izumi <[email protected]> Reviewed by: Yasuaki Ishimatsu <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Tang Chen <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
When trimming memcg consumption excess (see memory.high), we call try_to_free_mem_cgroup_pages without checking if we are allowed to sleep in the current context, which can result in a deadlock. Fix this. Fixes: 241994e ("mm: memcontrol: default hierarchy interface for memory") Signed-off-by: Vladimir Davydov <[email protected]> Cc: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Clear zram disk io accounting when resetting the zram device. Otherwise the residual io accounting stat will affect the diskstat in the next zram active cycle. Signed-off-by: Weijie Yang <[email protected]> Acked-by: Sergey Senozhatsky <[email protected]> Acked-by: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Commit d5e616f ("checkpatch: add a few more --fix corrections") broke the GLOBAL_INITIALISERS test with bad parentheses and optional leading spaces. Fix it. Signed-off-by: Joe Perches <[email protected]> Reported-by: Bandan Das <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
On -rt, the VM_BUG_ON(!irqs_disabled()) triggers inside the memcg swapout path because the spin_lock_irq(&mapping->tree_lock) in the caller doesn't actually disable the hardware interrupts - which is fine, because on -rt the tophalves run in process context and so we are still safe from preemption while updating the statistics. Remove the VM_BUG_ON() but keep the comment of what we rely on. Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Clark Williams <[email protected]> Cc: Fernando Lopez-Lezcano <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
If zs_create_pool()->create_handle_cache()->kmem_cache_create() or pool->name allocation fails, zs_create_pool()->destroy_handle_cache() will dereference the NULL pool->handle_cachep. Modify destroy_handle_cache() to avoid this. Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Jovi Zhangwei reported the following problem Below kernel vm bug can be triggered by tcpdump which mmaped a lot of pages with GFP_COMP flag. [Mon May 25 05:29:33 2015] page:ffffea0015414000 count:66 mapcount:1 mapping: (null) index:0x0 [Mon May 25 05:29:33 2015] flags: 0x20047580004000(head) [Mon May 25 05:29:33 2015] page dumped because: VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page)) [Mon May 25 05:29:33 2015] ------------[ cut here ]------------ [Mon May 25 05:29:33 2015] kernel BUG at mm/migrate.c:1661! [Mon May 25 05:29:33 2015] invalid opcode: 0000 [#1] SMP In this case it was triggered by running tcpdump but it's not necessary reproducible on all systems. sudo tcpdump -i bond0.100 'tcp port 4242' -c 100000000000 -w 4242.pcap Compound pages cannot be migrated and it was not expected that such pages be marked for NUMA balancing. This did not take into account that drivers such as net/packet/af_packet.c may insert compound pages into userspace with vm_insert_page. This patch tells the NUMA balancing protection scanner to skip all VM_MIXEDMAP mappings which avoids the possibility that compound pages are marked for migration. Signed-off-by: Mel Gorman <[email protected]> Reported-by: Jovi Zhangwei <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
We saw excessive direct memory compaction triggered by skb_page_frag_refill. This causes performance issues and add latency. Commit 5640f76 introduces the order-3 allocation. According to the changelog, the order-3 allocation isn't a must-have but to improve performance. But direct memory compaction has high overhead. The benefit of order-3 allocation can't compensate the overhead of direct memory compaction. This patch makes the order-3 page allocation atomic. If there is no memory pressure and memory isn't fragmented, the alloction will still success, so we don't sacrifice the order-3 benefit here. If the atomic allocation fails, direct memory compaction will not be triggered, skb_page_frag_refill will fallback to order-0 immediately, hence the direct memory compaction overhead is avoided. In the allocation failure case, kswapd is waken up and doing compaction, so chances are allocation could success next time. alloc_skb_with_frags is the same. The mellanox driver does similar thing, if this is accepted, we must fix the driver too. V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric V2: make the changelog clearer Cc: Eric Dumazet <[email protected]> Cc: Chris Mason <[email protected]> Cc: Debabrata Banerjee <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Pull drm fixes from Dave Airlie: "i915 and radeon fixes: i915: fix for connector oops regression DDC probing fix radeon: two radeon reverts, along with a freeze workaround and a fix" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/radeon: Make sure radeon_vm_bo_set_addr always unreserves the BO Revert "drm/radeon: adjust pll when audio is not enabled" Revert "drm/radeon: don't share plls if monitors differ in audio support" drm/radeon: fix freeze for laptop with Turks/Thames GPU. drm/i915: Fix DDC probe for passive adapters drm/i915: Properly initialize SDVO analog connectors
The previous patch tried to continue the probe if i915 binding fails. For for simplicity reason, we haven't implemented abort even for controller chips that are dedicated for HDMI/DP on HSW and BDW. However, Mengdong suggested that this can be dangerous; BIOS may disable gfx power well although the PCI entry for HD-audio is left, and this may result in the unexpected behavior, kernel errors, etc. For avoiding this situation, abort the probe at i915 binding failure only for HSW/BDW chips selectively. For other chips, it still continues. Fixes: bf06848 ('ALSA: hda - Continue probing even if i915 binding fails') Reported-by: Mengdong Lin <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
Some drivers implement only pause operation (no resuming). Example is pl330 where pause is needed for getting residuum. pl330 does not support resume operation, transfer must be stopped after pause. However for slaves this is exposed always as "pause and resume" which introduces subtle errors on Odroid U3 board (Exynos4412 with pl330). After adding pause function to pl330 driver the audio playback (utilizing DMA) gets choppy after some time (approximately 24 hours). Fix this by exposing "cmd_pause" if and only if pause and resume are implemented. Signed-off-by: Krzysztof Kozlowski <[email protected]> Reported-by: [email protected] Reported-by: Marek Szyprowski <[email protected]> Cc: <[email protected]> Fixes: 88987d2 ("dmaengine: pl330: add DMA_PAUSE feature") Acked-by: Maxime Ripard <[email protected]> Signed-off-by: Vinod Koul <[email protected]>
Returning zero from a 'store' function is bad. The return value should be either len length of the string or an error. So use 'len' if 'err' is zero. Fixes: 6791875 ("md: make reconfig_mutex optional for writes to md sysfs files.") Signed-off-by: NeilBrown <[email protected]> Cc: [email protected] (v4.0+)
Checking ->sync_thread without holding the mddev_lock() isn't really safe, even after flushing the workqueue which ensures md_start_sync() has been run. While this code is waiting for the lock, md_check_recovery could reap the thread itself, and then start another thread (e.g. recovery might finish, then reshape starts). When this thread gets the lock md_start_sync() hasn't run so it doesn't get reaped, but MD_RECOVERY_RUNNING gets cleared. This allows two threads to start which leads to confusion. So don't both if MD_RECOVERY_RUNNING isn't set, but if it is do the flush and the test and the reap all under the mddev_lock to avoid any race with md_check_recovery. Signed-off-by: NeilBrown <[email protected]> Fixes: 6791875 ("md: make reconfig_mutex optional for writes to md sysfs files.") Cc: [email protected] (v4.0+)
MD_RECOVERY_DONE is normally cleared by md_check_recovery after a resync etc finished. However it is possible for raid5_start_reshape to race and start a reshape before MD_RECOVERY_DONE is cleared. This can lean to multiple reshapes running at the same time, which isn't good. To make sure it is cleared before starting a reshape, and also clear it when reaping a thread, just to be safe. Signed-off-by: NeilBrown <[email protected]>
Although the extended tables are theoretically a completely orthogonal feature to PASID and anything else that *uses* the newly-available bits, some of the early hardware has problems even when all we do is enable them and use only the same bits that were in the old context tables. For now, there's no motivation to support extended tables unless we're going to use PASID support to do SVM. So just don't use them unless PASID support is advertised too. Also add a command-line bailout just in case later chips also have issues. The equivalent problem for PASID support has already been fixed with the upcoming VT-d spec update and commit bd00c60 ("iommu/vt-d: Change PASID support to bit 40 of Extended Capability Register"), because the problematic platforms use the old definition of the PASID-capable bit, which is now marked as reserved and meaningless. So with this change, we'll magically start using ECS again only when we see the new hardware advertising "hey, we have PASID support and we actually tested it this time" on bit 40. The VT-d hardware architect has promised that we are not going to have any reason to support ECS *without* PASID any time soon, and he'll make sure he checks with us before changing that. In the future, if hypothetical new features also use new bits in the context tables and can be seen on implementations *without* PASID support, we might need to add their feature bits to the ecs_enabled() macro. Signed-off-by: David Woodhouse <[email protected]>
Pull VT-d hardware workarounds from David Woodhouse: "This contains a workaround for hardware issues which I *thought* were never going to be seen on production hardware. I'm glad I checked that before the 4.1 release... Firstly, PASID support is so broken on existing chips that we're just going to declare the old capability bit 28 as 'reserved' and change the VT-d spec to move PASID support to another bit. So any existing hardware doesn't support SVM; it only sets that (now) meaningless bit 28. That patch *wasn't* imperative for 4.1 because we don't have PASID support yet. But *even* the extended context tables are broken — if you just enable the wider tables and use none of the new bits in them, which is precisely what 4.1 does, you find that translations don't work. It's this problem which I thought was caught in time to be fixed before production, but wasn't. To avoid triggering this issue, we now *only* enable the extended context tables on hardware which also advertises "we have PASID support and we actually tested it this time" with the new PASID feature bit. In addition, I've added an 'intel_iommu=ecs_off' command line parameter to allow us to disable it manually if we need to" * git://git.infradead.org/intel-iommu: iommu/vt-d: Only enable extended context tables if PASID is supported iommu/vt-d: Change PASID support to bit 40 of Extended Capability Register
Pull three more md fixes from Neil Brown: "Hasn't been a good cycle for md has it :-( The main issue fixed here is a rare race which can result in two reshape threads running at once, which doesn't end well. Also a minor issue with a write to a sysfs file returning the wrong value. Backports to 4.0-stable are indicated" * tag 'md/4.1-rc7-fixes' of git://neil.brown.name/md: md: make sure MD_RECOVERY_DONE is clear before starting recovery/resync md: Close race when setting 'action' to 'idle'. md: don't return 0 from array_state_store
Pull block layer fixes from Jens Axboe: "Remember about a week ago when I sent the last pull request for 4.1? Well, I lied. Now, I don't want to shift the blame, but Dan, Ming, and Richard made a liar out of me. Here are three small patches that should go into 4.1. More specifically, this pull request contains: - A Kconfig dependency for the pmem block driver, so it can't be selected if HAS_IOMEM isn't availble. From Richard Weinberger. - A fix for genhd, making the ext_devt_lock softirq safe. This makes lockdep happier, since we also end up grabbing this lock on release off the softirq path. From Dan Williams. - A blk-mq software queue release fix from Ming Lei. Last two are headed to stable, first fixes an issue introduced in this cycle" * 'for-linus' of git://git.kernel.dk/linux-block: block: pmem: Add dependency on HAS_IOMEM block: fix ext_dev_lock lockdep report blk-mq: free hctx->ctxs in queue's release handler
Currently, we can ask to authenticate DATA chunks and we can send DATA chunks on the same packet as COOKIE_ECHO, but if you try to combine both, the DATA chunk will be sent unauthenticated and peer won't accept it, leading to a communication failure. This happens because even though the data was queued after it was requested to authenticate DATA chunks, it was also queued before we could know that remote peer can handle authenticating, so sctp_auth_send_cid() returns false. The fix is whenever we set up an active key, re-check send queue for chunks that now should be authenticated. As a result, such packet will now contain COOKIE_ECHO + AUTH + DATA chunks, in that order. Reported-by: Liu Wei <[email protected]> Signed-off-by: Marcelo Ricardo Leitner <[email protected]> Acked-by: Neil Horman <[email protected]> Acked-by: Vlad Yasevich <[email protected]> Signed-off-by: David S. Miller <[email protected]>
This patch fix URL (http to https) for wiki.wireshark.org. Signed-off-by: Masanari Iida <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Pull networking fixes from David Miller: 1) Fix uninitialized struct station_info in cfg80211_wireless_stats(), from Johannes Berg. 2) Revert commit attempt to fix ipv6 protocol resubmission, it adds regressions. 3) Endless loops can be created in bridge port lists, fix from Nikolay Aleksandrov. 4) Don't WARN_ON() if sk->sk_forward_alloc is non-zero in sk_clear_memalloc, it is a legal situation during swap deactivation. Fix from Mel Gorman. 5) Fix order of disabling interrupts and unlocking NAPI in enic driver to avoid a race. From Govindarajulu Varadarajan. 6) High and low register writes are swapped when programming the start of periodic output in igb driver. From Richard Cochran. 7) Fix device rename handling in mpls stack, from Robert Shearman. 8) Do not trigger compaction synchronously when optimistically trying to allocate an order 3 page in alloc_skb_with_frags() and skb_page_frag_refill(). From Shaohua Li. 9) Authentication with COOKIE_ECHO is not handled properly in SCTP, fix from Marcelo Ricardo Leitner. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO net: don't wait for order-3 page allocation mpls: handle device renames for per-device sysctls net: igb: fix the start time for periodic output signals enic: fix memory leak in rq_clean enic: check return value for stat dump enic: unlock napi busy poll before unmasking intr net, swap: Remove a warning and clarify why sk_mem_reclaim is required when deactivating swap bridge: fix multicast router rlist endless loop tipc: disconnect socket directly after probe failure Revert "ipv6: Fix protocol resubmission" cfg80211: wext: clear sinfo struct before calling driver
The GIC chained handlers use do_IRQ() to call the subhandlers. This means that irq_enter() calls get nested, which leads to preempt count looking like we're in nested interrupts, which in turn leads to all system time being accounted as IRQ time in account_system_time(). Fix it by using generic_handle_irq(). Since these same functions are used in some systems (if cpu_has_veic) from a low-level vectored interrupt handler which does not go throught do_IRQ(), we need to do it conditionally. Signed-off-by: Rabin Vincent <[email protected]> Reviewed-by: Andrew Bresticker <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/10545/ Signed-off-by: Ralf Baechle <[email protected]>
This patch fixes mips compilation error: lib/mpi/generic_mpih-mul1.c: In function 'mpihelp_mul_1': lib/mpi/longlong.h:651:2: error: impossible constraint in 'asm' Signed-off-by: Jaedon Shin <[email protected]> Cc: Linux-MIPS <[email protected]> Patchwork: https://patchwork.linux-mips.org/patch/10546/ Signed-off-by: Ralf Baechle <[email protected]>
…l/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Most of commits are regression fixes for HD-audio: a few corner case fixes for regmap transition, and i915 binding issues. In addition, a quirk for another USB-audio device supporting DSD" * tag 'sound-4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Abort the probe without i915 binding for HSW/BDW ALSA: hda - Re-add the lost fake mute support ALSA: hda - Continue probing even if i915 binding fails ALSA: hda - Don't actually write registers for caps overwrites ALSA: hda - fix number of devices query on hotplug ALSA: usb-audio: add native DSD support for JLsounds I2SoverUSB
…linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A regression fix for a crash, and a Intel HSW uncore PMU driver fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization" perf/x86/intel/uncore: Fix CBOX bit wide and UBOX reg on Haswell-EP
…cm/linux/kernel/git/tip/tip Pull lockdep fix from Ingo Molnar: "A lockdep/modules unload race fix that can oops" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Fix a race between /proc/lock_stat and module unload
…inux/kernel/git/tip/tip Pull irqchip fix from Thomas Gleixner: "A single fix for an off by one bug in the sunxi irqchip driver" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: sunxi-nmi: Fix off-by-one error in irq iterator
…ream-linus Pull more MIPS fixes from Ralf Baechle: "Another round of 4.1 MIPS fixes, one fix to a MIPS-specific #if condition in lib/mpi, one fix to the MIPS GIC irqchip driver and one SSB fix. Details: - fix handling of clock in chipco SSB driver. - fix two MIPS-specific #if conditions to correctly work for GCC 5.1. - fix damage to R6 pgtable bits done by XPA support. - fix possible crash due to unloading modules that contain statically defined platform devices. - fix disabling of the MSA ASE on context switch to also work correctly when a new thread/process has the CPU for the very first time. This is part of linux-next and has been beaten to death on Imagination's test farm. While things are not looking too grim this pull request also means the rate of fixes for 4.1 remains nearly constant so I'd not be unhappy if you'd delay the release" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MPI: MIPS: Fix compilation error with GCC 5.1 IRQCHIP: mips-gic: Don't nest calls to do_IRQ() MIPS: MSA: bugfix - disable MSA correctly for new threads/processes. MIPS: Loongson: Do not register 8250 platform device from module. MIPS: Cobalt: Do not build MTD platform device registration code as module. SSB: Fix handling of ssb_pmu_get_alp_clock() MIPS: pgtable-bits: Fix XPA damage to R6 definitions.
Pull NTB fixes from Jon Mason: "I apologize for the tardiness of this request. Here are a couple of last minute NTB bug fixes for v4.1: NTB bug fixes to address issues in unmapping the MW reg base and vbase, and an uninitialized variable on Atom platforms" * tag 'ntb-4.1' of git://github.com/jonmason/ntb: ntb: initialize max_mw for Atom before using it ntb: iounmap MW reg and vbase in error path
Pull dmaengine fixes from Vinod Koul: "Here are hopefully last set of fixes for 4.1. This time we have: - fixing pause capability reporting on both dmaengine pause & resume support by Krzysztof - locking fix fir at_xdmac by Ludovic - slave configuration fix for at_xdmac by Ludovic" * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: Fix choppy sound because of unimplemented resume dmaengine: at_xdmac: rework slave configuration part dmaengine: at_xdmac: lock fixes
dabrace
pushed a commit
that referenced
this pull request
Mar 23, 2017
Since the nfct and nfctinfo have been combined, the nf_conn structure must be at least 8 bytes aligned, as the 3 LSB bits are used for the nfctinfo. But there's a fake nf_conn structure to denote untracked connections, which is created by a PER_CPU construct. This does not guarantee that it will be 8 bytes aligned and can break the logic in determining the correct nfctinfo. I triggered this on a 32bit machine with the following error: BUG: unable to handle kernel NULL pointer dereference at 00000af4 IP: nf_ct_deliver_cached_events+0x1b/0xfb *pdpt = 0000000031962001 *pde = 0000000000000000 Oops: 0000 [#1] SMP [Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 crc_ccitt ppdev r8169 parport_pc parport OK ] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-test+ #75 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 task: c126ec00 task.stack: c1258000 EIP: nf_ct_deliver_cached_events+0x1b/0xfb EFLAGS: 00010202 CPU: 0 EAX: 0021cd01 EBX: 00000000 ECX: 27b0c767 EDX: 32bcb17a ESI: f34135c0 EDI: f34135c0 EBP: f2debd60 ESP: f2debd3c DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 80050033 CR2: 00000af4 CR3: 309a0440 CR4: 001406f0 Call Trace: <SOFTIRQ> ? ipv6_skip_exthdr+0xac/0xcb ipv6_confirm+0x10c/0x119 [nf_conntrack_ipv6] nf_hook_slow+0x22/0xc7 nf_hook+0x9a/0xad [ipv6] ? ip6t_do_table+0x356/0x379 [ip6_tables] ? ip6_fragment+0x9e9/0x9e9 [ipv6] ip6_output+0xee/0x107 [ipv6] ? ip6_fragment+0x9e9/0x9e9 [ipv6] dst_output+0x36/0x4d [ipv6] NF_HOOK.constprop.37+0xb2/0xba [ipv6] ? icmp6_dst_alloc+0x2c/0xfd [ipv6] ? local_bh_enable+0x14/0x14 [ipv6] mld_sendpack+0x1c5/0x281 [ipv6] ? mark_held_locks+0x40/0x5c mld_ifc_timer_expire+0x1f6/0x21e [ipv6] call_timer_fn+0x135/0x283 ? detach_if_pending+0x55/0x55 ? mld_dad_timer_expire+0x3e/0x3e [ipv6] __run_timers+0x111/0x14b ? mld_dad_timer_expire+0x3e/0x3e [ipv6] run_timer_softirq+0x1c/0x36 __do_softirq+0x185/0x37c ? test_ti_thread_flag.constprop.19+0xd/0xd do_softirq_own_stack+0x22/0x28 </SOFTIRQ> irq_exit+0x5a/0xa4 smp_apic_timer_interrupt+0x2a/0x34 apic_timer_interrupt+0x37/0x3c By using DEFINE/DECLARE_PER_CPU_ALIGNED we can enforce at least 8 byte alignment as all cache line sizes are at least 8 bytes or more. Fixes: a9e419d ("netfilter: merge ctinfo into nfct pointer storage area") Signed-off-by: Steven Rostedt (VMware) <[email protected]> Acked-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
dabrace
pushed a commit
that referenced
this pull request
Sep 16, 2019
Driver copies FW commands to the HW queue as units of 16 bytes. Some of the command structures are not exact multiple of 16. So while copying the data from those structures, the stack out of bounds messages are reported by KASAN. The following error is reported. [ 1337.530155] ================================================================== [ 1337.530277] BUG: KASAN: stack-out-of-bounds in bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530413] Read of size 16 at addr ffff888725477a48 by task rmmod/2785 [ 1337.530540] CPU: 5 PID: 2785 Comm: rmmod Tainted: G OE 5.2.0-rc6+ #75 [ 1337.530541] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014 [ 1337.530542] Call Trace: [ 1337.530548] dump_stack+0x5b/0x90 [ 1337.530556] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530560] print_address_description+0x65/0x22e [ 1337.530568] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530575] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530577] __kasan_report.cold.3+0x37/0x77 [ 1337.530581] ? _raw_write_trylock+0x10/0xe0 [ 1337.530588] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530590] kasan_report+0xe/0x20 [ 1337.530592] memcpy+0x1f/0x50 [ 1337.530600] bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530608] ? bnxt_qplib_creq_irq+0xa0/0xa0 [bnxt_re] [ 1337.530611] ? xas_create+0x3aa/0x5f0 [ 1337.530613] ? xas_start+0x77/0x110 [ 1337.530615] ? xas_clear_mark+0x34/0xd0 [ 1337.530623] bnxt_qplib_free_mrw+0x104/0x1a0 [bnxt_re] [ 1337.530631] ? bnxt_qplib_destroy_ah+0x110/0x110 [bnxt_re] [ 1337.530633] ? bit_wait_io_timeout+0xc0/0xc0 [ 1337.530641] bnxt_re_dealloc_mw+0x2c/0x60 [bnxt_re] [ 1337.530648] bnxt_re_destroy_fence_mr+0x77/0x1d0 [bnxt_re] [ 1337.530655] bnxt_re_dealloc_pd+0x25/0x60 [bnxt_re] [ 1337.530677] ib_dealloc_pd_user+0xbe/0xe0 [ib_core] [ 1337.530683] srpt_remove_one+0x5de/0x690 [ib_srpt] [ 1337.530689] ? __srpt_close_all_ch+0xc0/0xc0 [ib_srpt] [ 1337.530692] ? xa_load+0x87/0xe0 ... [ 1337.530840] do_syscall_64+0x6d/0x1f0 [ 1337.530843] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1337.530845] RIP: 0033:0x7ff5b389035b [ 1337.530848] Code: 73 01 c3 48 8b 0d 2d 0b 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd 0a 2c 00 f7 d8 64 89 01 48 [ 1337.530849] RSP: 002b:00007fff83425c28 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 1337.530852] RAX: ffffffffffffffda RBX: 00005596443e6750 RCX: 00007ff5b389035b [ 1337.530853] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005596443e67b8 [ 1337.530854] RBP: 0000000000000000 R08: 00007fff83424ba1 R09: 0000000000000000 [ 1337.530856] R10: 00007ff5b3902960 R11: 0000000000000206 R12: 00007fff83425e50 [ 1337.530857] R13: 00007fff8342673c R14: 00005596443e6260 R15: 00005596443e6750 [ 1337.530885] The buggy address belongs to the page: [ 1337.530962] page:ffffea001c951dc0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 1337.530964] flags: 0x57ffffc0000000() [ 1337.530967] raw: 0057ffffc0000000 0000000000000000 ffffffff1c950101 0000000000000000 [ 1337.530970] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 1337.530970] page dumped because: kasan: bad access detected [ 1337.530996] Memory state around the buggy address: [ 1337.531072] ffff888725477900: 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 f2 f2 f2 [ 1337.531180] ffff888725477980: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 [ 1337.531288] >ffff888725477a00: 00 f2 f2 f2 f2 f2 f2 00 00 00 f2 00 00 00 00 00 [ 1337.531393] ^ [ 1337.531478] ffff888725477a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1337.531585] ffff888725477b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1337.531691] ================================================================== Fix this by passing the exact size of each FW command to bnxt_qplib_rcfw_send_message as req->cmd_size. Before sending the command to HW, modify the req->cmd_size to number of 16 byte units. Fixes: 1ac5a40 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Selvin Xavier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Doug Ledford <[email protected]>
dabrace
pushed a commit
that referenced
this pull request
Jan 2, 2020
…zones In case we have to migrate a ballon page to a newpage of another zone, the managed page count of both zones is wrong. Paired with memory offlining (which will adjust the managed page count), we can trigger kernel crashes and all kinds of different symptoms. One way to reproduce: 1. Start a QEMU guest with 4GB, no NUMA 2. Hotplug a 1GB DIMM and online the memory to ZONE_NORMAL 3. Inflate the balloon to 1GB 4. Unplug the DIMM (be quick, otherwise unmovable data ends up on it) 5. Observe /proc/zoneinfo Node 0, zone Normal pages free 16810 min 24848885473806 low 18471592959183339 high 36918337032892872 spanned 262144 present 262144 managed 18446744073709533486 6. Do anything that requires some memory (e.g., inflate the balloon some more). The OOM goes crazy and the system crashes [ 238.324946] Out of memory: Killed process 537 (login) total-vm:27584kB, anon-rss:860kB, file-rss:0kB, shmem-rss:00 [ 238.338585] systemd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 [ 238.339420] CPU: 0 PID: 1 Comm: systemd Tainted: G D W 5.4.0-next-20191204+ #75 [ 238.340139] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4 [ 238.341121] Call Trace: [ 238.341337] dump_stack+0x8f/0xd0 [ 238.341630] dump_header+0x61/0x5ea [ 238.341942] oom_kill_process.cold+0xb/0x10 [ 238.342299] out_of_memory+0x24d/0x5a0 [ 238.342625] __alloc_pages_slowpath+0xd12/0x1020 [ 238.343024] __alloc_pages_nodemask+0x391/0x410 [ 238.343407] pagecache_get_page+0xc3/0x3a0 [ 238.343757] filemap_fault+0x804/0xc30 [ 238.344083] ? ext4_filemap_fault+0x28/0x42 [ 238.344444] ext4_filemap_fault+0x30/0x42 [ 238.344789] __do_fault+0x37/0x1a0 [ 238.345087] __handle_mm_fault+0x104d/0x1ab0 [ 238.345450] handle_mm_fault+0x169/0x360 [ 238.345790] do_user_addr_fault+0x20d/0x490 [ 238.346154] do_page_fault+0x31/0x210 [ 238.346468] async_page_fault+0x43/0x50 [ 238.346797] RIP: 0033:0x7f47eba4197e [ 238.347110] Code: Bad RIP value. [ 238.347387] RSP: 002b:00007ffd7c0c1890 EFLAGS: 00010293 [ 238.347834] RAX: 0000000000000002 RBX: 000055d196a20a20 RCX: 00007f47eba4197e [ 238.348437] RDX: 0000000000000033 RSI: 00007ffd7c0c18c0 RDI: 0000000000000004 [ 238.349047] RBP: 00007ffd7c0c1c20 R08: 0000000000000000 R09: 0000000000000033 [ 238.349660] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000001 [ 238.350261] R13: ffffffffffffffff R14: 0000000000000000 R15: 00007ffd7c0c18c0 [ 238.350878] Mem-Info: [ 238.351085] active_anon:3121 inactive_anon:51 isolated_anon:0 [ 238.351085] active_file:12 inactive_file:7 isolated_file:0 [ 238.351085] unevictable:0 dirty:0 writeback:0 unstable:0 [ 238.351085] slab_reclaimable:5565 slab_unreclaimable:10170 [ 238.351085] mapped:3 shmem:111 pagetables:155 bounce:0 [ 238.351085] free:720717 free_pcp:2 free_cma:0 [ 238.353757] Node 0 active_anon:12484kB inactive_anon:204kB active_file:48kB inactive_file:28kB unevictable:0kB iss [ 238.355979] Node 0 DMA free:11556kB min:36kB low:48kB high:60kB reserved_highatomic:0KB active_anon:152kB inactivB [ 238.358345] lowmem_reserve[]: 0 2955 2884 2884 2884 [ 238.358761] Node 0 DMA32 free:2677864kB min:7004kB low:10028kB high:13052kB reserved_highatomic:0KB active_anon:0B [ 238.361202] lowmem_reserve[]: 0 0 72057594037927865 72057594037927865 72057594037927865 [ 238.361888] Node 0 Normal free:193448kB min:99395541895224kB low:73886371836733356kB high:147673348131571488kB reB [ 238.364765] lowmem_reserve[]: 0 0 0 0 0 [ 238.365101] Node 0 DMA: 7*4kB (U) 5*8kB (UE) 6*16kB (UME) 2*32kB (UM) 1*64kB (U) 2*128kB (UE) 3*256kB (UME) 2*512B [ 238.366379] Node 0 DMA32: 0*4kB 1*8kB (U) 2*16kB (UM) 2*32kB (UM) 2*64kB (UM) 1*128kB (U) 1*256kB (U) 1*512kB (U)B [ 238.367654] Node 0 Normal: 1985*4kB (UME) 1321*8kB (UME) 844*16kB (UME) 524*32kB (UME) 300*64kB (UME) 138*128kB (B [ 238.369184] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 238.369915] 130 total pagecache pages [ 238.370241] 0 pages in swap cache [ 238.370533] Swap cache stats: add 0, delete 0, find 0/0 [ 238.370981] Free swap = 0kB [ 238.371239] Total swap = 0kB [ 238.371488] 1048445 pages RAM [ 238.371756] 0 pages HighMem/MovableOnly [ 238.372090] 306992 pages reserved [ 238.372376] 0 pages cma reserved [ 238.372661] 0 pages hwpoisoned In another instance (older kernel), I was able to observe this (negative page count :/): [ 180.896971] Offlined Pages 32768 [ 182.667462] Offlined Pages 32768 [ 184.408117] Offlined Pages 32768 [ 186.026321] Offlined Pages 32768 [ 187.684861] Offlined Pages 32768 [ 189.227013] Offlined Pages 32768 [ 190.830303] Offlined Pages 32768 [ 190.833071] Built 1 zonelists, mobility grouping on. Total pages: -36920272750453009 In another instance (older kernel), I was no longer able to start any process: [root@vm ~]# [ 214.348068] Offlined Pages 32768 [ 215.973009] Offlined Pages 32768 cat /proc/meminfo -bash: fork: Cannot allocate memory [root@vm ~]# cat /proc/meminfo -bash: fork: Cannot allocate memory Fix it by properly adjusting the managed page count when migrating if the zone changed. The managed page count of the zones now looks after unplug of the DIMM (and after deflating the balloon) just like before inflating the balloon (and plugging+onlining the DIMM). We'll temporarily modify the totalram page count. If this ever becomes a problem, we can fine tune by providing helpers that don't touch the totalram pages (e.g., adjust_zone_managed_page_count()). Please note that fixing up the managed page count is only necessary when we adjusted the managed page count when inflating - only if we don't have VIRTIO_BALLOON_F_DEFLATE_ON_OOM. With that feature, the managed page count is not touched when inflating/deflating. Reported-by: Yumei Huang <[email protected]> Fixes: 3dcc057 ("mm: correctly update zone->managed_pages") Cc: <[email protected]> # v3.11+ Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Igor Mammedov <[email protected]> Cc: [email protected] Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
dabrace
pushed a commit
that referenced
this pull request
Feb 1, 2021
…abled When booting a kernel which has been built with CONFIG_AMD_MEM_ENCRYPT enabled as a Xen pv guest a warning is issued for each processor: [ 5.964347] ------------[ cut here ]------------ [ 5.968314] WARNING: CPU: 0 PID: 1 at /home/gross/linux/head/arch/x86/xen/enlighten_pv.c:660 get_trap_addr+0x59/0x90 [ 5.972321] Modules linked in: [ 5.976313] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.11.0-rc5-default #75 [ 5.980313] Hardware name: Dell Inc. OptiPlex 9020/0PC5F7, BIOS A05 12/05/2013 [ 5.984313] RIP: e030:get_trap_addr+0x59/0x90 [ 5.988313] Code: 42 10 83 f0 01 85 f6 74 04 84 c0 75 1d b8 01 00 00 00 c3 48 3d 00 80 83 82 72 08 48 3d 20 81 83 82 72 0c b8 01 00 00 00 eb db <0f> 0b 31 c0 c3 48 2d 00 80 83 82 48 ba 72 1c c7 71 1c c7 71 1c 48 [ 5.992313] RSP: e02b:ffffc90040033d38 EFLAGS: 00010202 [ 5.996313] RAX: 0000000000000001 RBX: ffffffff82a141d0 RCX: ffffffff8222ec38 [ 6.000312] RDX: ffffffff8222ec38 RSI: 0000000000000005 RDI: ffffc90040033d40 [ 6.004313] RBP: ffff8881003984a0 R08: 0000000000000007 R09: ffff888100398000 [ 6.008312] R10: 0000000000000007 R11: ffffc90040246000 R12: ffff8884082182a8 [ 6.012313] R13: 0000000000000100 R14: 000000000000001d R15: ffff8881003982d0 [ 6.016316] FS: 0000000000000000(0000) GS:ffff888408200000(0000) knlGS:0000000000000000 [ 6.020313] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.024313] CR2: ffffc900020ef000 CR3: 000000000220a000 CR4: 0000000000050660 [ 6.028314] Call Trace: [ 6.032313] cvt_gate_to_trap.part.7+0x3f/0x90 [ 6.036313] ? asm_exc_double_fault+0x30/0x30 [ 6.040313] xen_convert_trap_info+0x87/0xd0 [ 6.044313] xen_pv_cpu_up+0x17a/0x450 [ 6.048313] bringup_cpu+0x2b/0xc0 [ 6.052313] ? cpus_read_trylock+0x50/0x50 [ 6.056313] cpuhp_invoke_callback+0x80/0x4c0 [ 6.060313] _cpu_up+0xa7/0x140 [ 6.064313] cpu_up+0x98/0xd0 [ 6.068313] bringup_nonboot_cpus+0x4f/0x60 [ 6.072313] smp_init+0x26/0x79 [ 6.076313] kernel_init_freeable+0x103/0x258 [ 6.080313] ? rest_init+0xd0/0xd0 [ 6.084313] kernel_init+0xa/0x110 [ 6.088313] ret_from_fork+0x1f/0x30 [ 6.092313] ---[ end trace be9ecf17dceeb4f3 ]--- Reason is that there is no Xen pv trap entry for X86_TRAP_VC. Fix that by adding a generic trap handler for unknown traps and wire all unknown bare metal handlers to this generic handler, which will just crash the system in case such a trap will ever happen. Fixes: 0786138 ("x86/sev-es: Add a Runtime #VC Exception Handler") Cc: <[email protected]> # v5.10 Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Andrew Cooper <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.